Rare genomic copy number variants implicate new candidate genes for bicuspid aortic valve

Bicuspid aortic valve (BAV), the most common congenital heart defect, is a major cause of aortic valve disease requiring valve interventions and thoracic aortic aneurysms predisposing to acute aortic dissections. The spectrum of BAV ranges from early onset valve and aortic complications (EBAV) to sporadic late onset disease. Rare genomic copy number variants (CNVs) have previously been implicated in the development of BAV and thoracic aortic aneurysms. We determined the frequency and gene content of rare CNVs in EBAV probands (n = 272) using genome-wide SNP microarray analysis and three complementary CNV detection algorithms (cnvPartition, PennCNV, and QuantiSNP). Unselected control genotypes from the Database of Genotypes and Phenotypes were analyzed using identical methods. We filtered the data to select large genic CNVs that were detected by multiple algorithms. Findings were replicated in a BAV cohort with late onset sporadic disease (n = 5040). We identified 3 large and rare (< 1,1000 in controls) CNVs in EBAV probands. The burden of CNVs intersecting with genes known to cause BAV when mutated was increased in case-control analysis. CNVs intersecting with GATA4 and DSCAM were enriched in cases, recurrent in other datasets, and segregated with disease in families. In total, we identified potentially pathogenic CNVs in 9% of EBAV cases, implicating alterations of candidate genes at these loci in the pathogenesis of BAV.


Introduction
Copy number variants (CNVs) have been implicated as causes or modifiers of many human diseases [1].Specifically, large genomic CNVs are significantly enriched in cohorts with developmental delay or congenital abnormalities, and the severity of phenotypes has been correlated with the burden of rare CNVs [2].These observations show that large, rare, de novo CNVs are likely to be pathogenic and can exert clinically relevant effects on disease pathogenesis [3,4].
The worldwide prevalence of congenital heart disease (CHD) is 8.2 per 1000 live births [5].CNVs have been implicated in both syndromic and non-syndromic forms of CHD [6][7][8][9][10].The pathogenicity and penetrance of CNVs was initially established for clinical syndromes such as velocardiofacial syndrome, Turner syndrome, or Williams-Beuren syndrome, which involve chromosomal or megabase scale duplications or deletions, but has since been expanded to include additional CHD subtypes [10].CNVs contribute to 10% of all CHD cases and up to 25% of cases with extracardiac anomalies or other syndromic features [11].The role of pathogenic CNVs affecting genes that are known to cause CHD when mutated, such as ELN and TBX1, has been established [12].Furthermore, population-level analysis has consistently demonstrated an increased burden of CHD in carriers of CNVs at specific genomic hotspots compared to controls, displaying the pathogenic potential of rare or de novo CNVs [12][13][14].
Bicuspid aortic valve (BAV) is the most common congenital heart malformation with a population prevalence of 0.5-2% [15].BAV predisposes to aortic valve stenosis and thoracic aortic aneurysms and is associated with other left ventricular outflow tract lesions such as mitral valve disease and coarctation [16].The high heritability of BAV was demonstrated in first-and second-degree relatives, who are more than ten times more likely to be diagnosed with BAV compared to matched controls [17].BAV can occur as an isolated congenital lesion or as part of a clinical syndrome.For example, the prevalence of BAV is increased in Velocardiofacial, Loeys-Dietz, Kabuki, and Turner syndromes.Pathogenic variants of several genes are implicated in familial non-syndromic BAV, which is typically inherited as an autosomal dominant trait with reduced penetrance and variable expressivity.There is strong cumulative evidence that pathogenic variants in GATA4, GATA6, MIB1, NOTCH1, ROBO4, SMAD4, and SMAD6 each contribute to a small percentage of non-syndromic BAV cases [18,19].Phenotypic expression of BAV disease ranges from incidental discovery in late adulthood to neonatal or childhood onset with severe manifestations requiring valve or aortic interventions.In comparison to patients with later disease onset, younger BAV cohorts tend to present with syndromic features or complex congenital malformations that are more likely to have a genetic cause, thereby increasing the power of association studies to discover clinically relevant CNVs [20].Recently, we identified recurrent rare CNVs that were enriched for cardiac developmental genes in a young cohort with early-onset thoracic aortic aneurysms or acute aortic dissections [21].
We hypothesize that large rare genomic CNVs contribute to early onset complications of BAV.Consistent with previous observations, we predict that the burden and penetrance of rare CNVs will be increased in individuals with early onset disease when compared to elderly sporadic BAV cases and population controls.Identification of novel pathogenic CNVs can provide new insights into the genetic complexity of BAV and may be useful for personalized risk stratification or clinical guidance based on the specific recurrent CNV (Fig 1) [22].Therefore, we set out to describe the burden and penetrance of rare CNVs in a young cohort with early onset complications of BAV disease (EBAV).

Materials and methods
The study protocol was approved by the Committee for the Protection of Human Subjects at the University of Texas Health Science Center at Houston (HSC-MS-11-0185).Study recruitment began on July 1, 2017, and concluded on March 30, 2022.After written informed consent, we enrolled probands with early onset BAV disease (EBAV), which we defined as individuals with BAV who were under the age of 30 at the time of first clinical event.Clinical events were defined as aortic replacement, aortic valve surgery, aortic dissection, moderate or severe aortic stenosis or aortic regurgitation, large aneurysm (Z > 4.5), or intervention for BAV-related conditions.Those with hypoplastic left heart, known genetic mutations, genetic syndromes, or complex congenital heart disease were excluded.Samples were collected and genotyped as previously reported [23].For comparison, we analyzed a cohort of older individuals of European ancestry with sporadic BAV disease selected from the International BAV Consortium (BAVGWAS) [24].
Phenotypes were derived from record review with confirmation of image data whenever possible [25,26].The computational pipeline for CNV analysis of Illumina single nucleotide polymorphism (SNP) array data included three independent CNV detection algorithms (Fig 2, S1 Appendix).
GenomeStudio was used to exclude samples with indeterminate sex or more than 5% missing genotypes, and single nucleotide polymorphisms (SNPs) with GenTrain = 0. Principal component analysis was used to remove outliers that did not cluster with European ancestry.Prior to CNV analysis, each dataset was trimmed by selecting a common set of 650,000 SNPs that were genotyped on each of the microarrays used in this study.
Three independent algorithms (PennCNV, cnvPartition, and QuantiSNP) were used to generate CNV calls and sample-level quality statistics from SNP intensity data.PennCNV and QuantiSNP were run on Unix clusters and cnvPartition data were exported from GenomeStudio.The analysis was run using default configurations.
PennCNV was used to generate QC data and remove CNV calls that intersect with polymorphic genomic regions.Samples that met any of the following criteria were excluded, standard deviation of the LogR ratio (obtained from PennCNV) > 0.35 or number of CNVs > 2 standard deviations above the mean for each data set.CNV calls less than 20 Kilobases in length and/or spanned by fewer than 6 probes were excluded.The overlap function for rare CNVs in PLINK was used to construct CNV regions (CNVRs) after adjacent regions were merged using PennCNV.
LogR ratio (LRR) and B allele frequency (BAF) data at CNVRs and calls of interest were visualized in GenomeStudio for validation.For segregation analysis, GenomeStudio was used to determine the presence of CNVs in relatives.
A total of 22,014 unselected control Illumina Genotypes obtained from the Database of Genotypes and Phenotypes were analyzed using identical methods (S1 Table ).The Wisconsin Longitudinal Study (WLS) includes data on a cohort of 10,300 individuals who graduated from Wisconsin high schools in 1957.The Health and Retirement Study (HRS) includes data on 37,000 individuals aged 50 above from 23,000 households across the United States.Principal component analysis was used to select European ancestry genotypes from these datasets for  [27], cnvPartition, and QuantiSNP [28]) were used to generate initial CNV calls and sample-level statistics.Sample-level quality control analysis was performed using PennCNV.PLINK [29] was used to define CNV regions for subsequent burden, enrichment, and replication tests.Raw CNV calls were individually screened for CNVs intersecting with genes implicated in BAV and enriched in casecontrol tests.CNVs were validated by examining the raw data in GenomeStudio.https://doi.org/10.1371/journal.pone.0304514.g002analysis.Datasets were paired for case-control analysis based on the concordance of log-transformed sample-level quality control statistics (number of CNV calls and standard deviation of logR ratios).Chi-squared or Fisher exact tests were used to compare CNV frequencies in cases and controls.
Rare CNV functions in PLINK (v1.7) were used to perform permutation-based burden tests or gene set-based enrichment tests as previously described [29,30].Case control burden tests were restricted to CNVs that were longer than 110 Kb and less than 0.1% in frequency.CNV overlap functions in PLINK were used to identify rare CNVs that intersect between datasets or involve specific BAV or CHD genes (S2 Table ).The list of candidate genes included 190 CHD genes that have strong cumulative evidence to cause BAV or related congenital malformations from human or animal model data [31][32][33].Genome Reference Consortium Human Build 37 was used for CNV annotation [34].

Results
The EBAV cohort included a total of 544 samples, 272 EBAV probands, 21 relatives with BAV, and 251 apparently unaffected family members (26 trios and 15 multiplex families).The BAVGWAS sample contained 5,040 genotypes with associated demographic and clinical data.After exclusions due to data quality control or missing phenotypic data, 499 EBAV genotypes and 4,216 BAVGWAS were included in the final analysis.In phenotypic comparisons, EBAV probands were significantly younger at diagnosis, had more frequent co-existing congenital heart and vascular lesions, and underwent more frequent valve or aortic interventions (Table 1).
In comparisons between EBAV and WLS data, the rate of large CNVs was increased in EBAV cases compared to WLS controls, driven primarily by enrichment of large rare genomic deletions (P<0.001,Table 2, S3 Table ).
In BAVGWAS cases, there were no significant differences in CNV rates or the proportions of individuals with large rare CNVs in comparison to HRS controls (Table 3).
We discovered 32 large (>250 kb) and rare (<1:1000) CNVs in EBAV probands, and 26 of these CNVs included protein-coding genes (S4 Table ).The overall burden of large rare genic CNVs was not different between EBAV cases and WLS controls.However, the burden of large rare genic CNVs intersecting with genes known to cause BAV when mutated or implicated in syndromic BAV was significantly increased in EBAV cases (Table 4).EBAV probands with large rare CNVs were more likely to have concomitant congenital heart lesions and have other family members who were diagnosed with BAV (S5 Table ).We also scrutinized candidate genomic regions that are implicated in CHD by analyzing data from individual CNV algorithms to detect copy number alterations that may have been filtered out due to strict quality control criteria prior to enrichment studies.We identified additional rare EBAV CNVs that intersect with CHD candidate genes CELSR1, GJA5, RAF1, LTBP1, KIF1A, MYH11, TTN, and the VCFS region in 22q11.2.We detected additional GATA4 and DSCAM CNVs in multiplex families.These CNVs were enriched in EBAV cases compared to WLS controls (Table 5, S10 Table ).In total, 9% of EBAV probands had CNVs that are likely to contribute to the development of BAV (S11 Table ).
We similarly scrutinized BAVGWAS data for additional CNVs that intersect with CHD genes.We found that large duplications involving SOX7 and GATA4 in 8p23 and the VCFS region in 22q11.2 were also significantly enriched in BAVGWAS cases compared to HRS controls (Table 6, S12 Table ).
Next, we attempted to replicate our observations by identifying CNVs in the BAVGWAS dataset that overlapped with EBAV CNVs.Large rare CNVs intersecting with GATA4, DSCAM, CELSR1, GJA5, MYH11, KIF1A, and TBX1 overlapped between the EBAV and BAVGWAS datasets (S13 Table ).However, only CNVs intersecting with GATA4 and DSCAM were significantly enriched in both datasets (Fig 3).We also identified 21 very large genomic CNVs > 5 Mb in length in the BAVGWAS dataset.Analysis of GenomeStudio data showed that most of these were mosaic loss of heterozygosity regions or duplications.Nine were large germline chromosome-scale aberrations, including two cases of trisomy 21 (S14 Table ).We did not identify any large X chromosome copy variants that may be consistent with Turner syndrome.There were no megabase-scale copy number variants in the EBAV dataset.
Pedigree analysis showed that CNVs involving CELSR1, LTBP1, KIF1A, GATA4, and DSCAM segregate with BAV in EBAV families (S1 Fig) .Most of these CNVs occurred de novo in probands and were not found in unaffected family members.CNV carriers tended to present due to moderate or severe aortic regurgitation requiring valvular surgery.One proband had aortic coarctation.The age at presentation or sex of individuals with rare CNVs was not significantly different from the rest of the EBAV cohort.

Discussion
We identified large, rare, and likely pathogenic CNVs in almost 10% of EBAV probands that are enriched in genes that cause BAV when mutated.The percentage of EBAV cases with likely pathogenic CNVs is similar to our previous observations in a cohort with early onset TAD [36].Enrichment of CNVs involving GATA4 and DSCAM in EBAV cases replicated in two additional BAV datasets and thousands of unselected control genotypes.This analysis provides compelling evidence that rare CNVs collectively contribute to more BAV cases than any single mutated gene.GATA-binding protein 4 (GATA4) is a transcription factor that is required for cardiac and neuronal differentiation during embryogenesis [37].Mutations of GATA4 and its homologs GATA5 and GATA6 cause congenital heart lesions [38].Mutations in the GATA4 gene have been linked to a range of congenital heart diseases in humans, such as cardiac septal defects, tetralogy of Fallot, and patent ductus arteriosus [39].Patients with BAV who have rare functional variants in the GATA family exhibit varying degrees of aortopathy expression, including aortic aneurysm, dissection, and/or aortic stenosis.Alonso-Montes et al. described 4 predicted deleterious GATA4 mutations in 122 non-syndromic BAV probands who did not have affected relatives [40].Rare GATA4 deletions and putative loss of function mutations are also implicated in CHD with distinctive features, underlining the importance of GATA4 dosage to cardiac development [41,42].Glessner et al. discovered large de novo duplications involving GATA4 in CHD trios with conotruncal defects or left ventricular outflow tract obstructive lesions [43].Some duplications were inherited from apparently unaffected parents.Zogopoulos and Yu described similar genomic duplications in unaffected individuals and in unselected control genotypes [44,45].
These observations are consistent with low-penetrance CHD in GATA4 duplication carriers.Similar to other complex and multifactorial disorders, CHD pathogenesis is likely caused by the cumulative impact of multiple CNVs or mutations, each exerting small to moderate effects to collectively disrupt cardiac development.For example, the frequency of congenital heart lesions is increased in individuals who have both 22q11.2deletions and a common 12p13.31duplication involving the SLC2A3 gene.The SLC2A3 CNV likely functions as a modifier of the cardiac phenotype associated with 22q11 deletion syndrome, exemplifying a "twohit" model [46].
More than half of patients with Down syndrome have congenital heart malformations due to the interaction of multiple dosage-sensitive CHD genes on chromosome 21 [47][48][49].Down syndrome cell adhesion molecule (DSCAM), previously shown to play a critical role in neurogenesis, has also been implicated in the pathophysiology of CHD [50].Analysis of rare segmental trisomies of chromosome 21 suggested that duplication of DSCAM and the contiguous COL6A1 and COL6A2 genes may cause septal abnormalities and other Down Syndrome- related CHD lesions, including BAV.Overexpression of DSCAM and COL6A2 causes cardiac malformations in mice [51].Our findings suggest that rare CNVs involving DSCAM may contribute to some non-syndromic BAV cases.GATA4 and DSCAM CNVs segregated with disease in multiple families but are not fully penetrant and were detected in some unaffected relatives.Intriguingly, large 22q11.2,GATA4 and DSCAM CNVs were more highly enriched in EBAV than in BAVGWAS cases, suggesting that these CNVs may drive early onset BAV disease.These results are consistent with our observation that pathogenic CNVs involving candidate BAV genes are also enriched in EBAV compared to BAVGWAS cases.Our data suggests that pathogenic CNVs at these loci may predict accelerated disease onset or more severe complications.
We also identified recurrent rare CNVs involving specific dosage-sensitive cardiac developmental genes that are implicated in non-syndromic CHD.Recurrent 1q21.1 distal deletions encompassing GJA5, the gene encoding Connexin-40, are associated with CHD lesions including BAV.Enrichment of small genomic duplications spanning the GJA5 gene in cohorts with tetralogy of Fallot and cardiac abnormalities in mice with a targeted GJA5 deletion imply that dosage variations of GJA5 contribute to CHD [52].CELSR1, a cadherin superfamily member, is mutated in families with BAV and hypoplastic left heart syndrome [53].LTBP1 encodes an extracellular matrix protein that regulates TGF-β and fibrillin and has been implicated in congenital heart lesions [54].KIF1A, encoding a kinesin microtubule transporter, was implicated in a dominant multisystem syndromic disorder with valvular and cardiac defects [55].Mutation of MYH11 causes familial thoracic aortic aneurysms and dissections with an increased prevalence of BAV [56].TTN mutations cause dilated cardiomyopathy and are associated with other left-sided congenital lesions [57].Mutations or copy number changes involving these genes all cause a wide spectrum of penetrance and phenotypic severity, consistent with sensitivity to genetic or clinical modifiers.
Our combinatorial analysis method eliminated many CNVs that were detected by single algorithms or did not meet quality control benchmarks.Therefore, our analysis likely underestimated the contribution of rare pathogenic CNVs to BAV.We also recognize that cardiac development involves the complex interaction of many genes.We selectively validated individual CNVs at loci of interest but may have underrepresented CNVs that had no a priori relationship with CHD.The apparent penetrance of some CNVs may be less than expected due to missing phenotypic information.The available clinical data was not sufficiently detailed to permit genotype-phenotype correlations with specific CHD clinical features.
In conclusion, we identified large rare CNVs in a significant proportion of BAV cases, including a subset of CNVs that may predict early onset complications of BAV disease.These observations add to the evidence that rare CNVs may eventually have clinical utility for risk stratification and personalized disease management (

Fig 2 .
Fig 2.Overview of pipeline for CNV identification and validation.SNP, single nucleotide polymorphism; QC, Quality control; CNV, copy number variant.Illumina B-allele frequency and signal intensity data was trimmed and exported using GenomeStudio.Three different algorithms (PennCNV[27], cnvPartition, and QuantiSNP[28]) were used to generate initial CNV calls and sample-level statistics.Sample-level quality control analysis was performed using PennCNV.PLINK[29] was used to define CNV regions for subsequent burden, enrichment, and replication tests.Raw CNV calls were individually screened for CNVs intersecting with genes implicated in BAV and enriched in casecontrol tests.CNVs were validated by examining the raw data in GenomeStudio.

Table 1 . Characteristics of EBAV and BAVGWAS probands.
In region-based association tests, 55 genes spanned by 26 CNVs were enriched in the EBAV cohort compared to WLS controls (S6 and S7 Tables).The largest CNVs involved DSCAM in 21q22 and GATA4 in 8p23.Large duplications involving the Velocardiofacial (VCFS) region in 22q11.2 and a recurrent CHD-associated CNV region in 1q21.1 were also enriched in EBAV probands.In BAVGWAS cases, common CNVs involving NANOG in 12p13.31and rare CNVs involving NIBPL in 5p13.2, which are essential for early heart development, are enriched in comparison to HRS controls (S8 and S9 Tables).NIBPL mutations cause Cornelia-de Lange syndrome with a spectrum of congenital heart malformations including BAV.

Table 3 . Burden analysis of BAVGWAS CNVs.
Large, CNV regions between 250 Kb and 5 Mb in length; Rare, occur in fewer than 1 in 1000 individuals; RATE, number of CNVs per individual; PROP, proportion of samples with one or more CNVs; TOT, total length of all CNVs in kilobases; AVG, mean CNV length; P, permuted P value.Tests are 1-sided with 100,000 permutations.https://doi.org/10.1371/journal.pone.0304514.t003

Table 4 . Burden of rare EBAV CNVs.
Calls, total number of CNVs that met the specified criteria; Rate, number of CNVs per individual; RR, relative risk; P, P value; Genic, CNVs that intersect with genes; BAV, CNVs that intersect with genes that are known to cause bicuspid aortic valve (BAV) when mutated or implicated in syndromic BAV; Total, total number of large, rare CNVs.P values were calculated using 100,000 permutations.

Table 5 . CNVs affecting congenital heart disease genes in EBAV.
Region, hg38 coordinates corresponding to the minimum overlap region of CNVs; Genes, candidate genes in region; Case, number of rare CNVs in EBAV cases that intersect with region; Control, number of CNVs in WLS controls that intersect with region; OR, odds ratio; P, chi-squared P-value; 95% CI, 95% confidence interval.Inherited CNVs were only counted once.https://doi.org/10.1371/journal.pone.0304514.t005

Table 6 . CNVs affecting congenital heart disease genes in BAVGWAS.
Region, hg38 coordinates corresponding to the minimum overlap region of CNVs; Genes, candidate genes in region; Case, number of rare CNVs in EBAV cases that intersect with region; Control, number of CNVs in WLS controls that intersect with region; OR, odds ratio; P, chi-squared P-value; 95% CI, 95% confidence interval.Inherited CNVs were only counted once. https://doi.org/10.1371/journal.pone.0304514.t006